Abstract

Load-balancing strategies are important for improving the performance and reliability of resources in
data centers. One of the challenging scheduling problems in Cloud data centers is to take into account
the allocation and migration of reconfigurable virtual machines (VMs) together with the integrated
features of the hosting physical machines (PMs). In the reservation model, the workload of data centers
has fixed processing-interval characteristics. In general, load-balance scheduling is an NP-hard problem,
as proved in the open literature. Traditionally, for offline load balancing without migration, one of the
best approaches is LPT (Longest Process Time first), which is well known to have an approximation ratio
of 4/3. With virtualization, reactive (post) migration of VMs after allocation is one popular way to
achieve load balance and traffic consolidation. However, reactive migration has difficulty reaching
predefined load balance objectives, and may cause service interruption and instability as well as other
associated costs. In view of this, we propose a new paradigm, called Prepartition, which proactively sets
a process-time bound for each request on each PM and prepares VM migrations in advance to achieve the
predefined balance goal. By preparing VM migration in advance, Prepartition can reduce process time and
instability and achieve better load balance as desired. Trace-driven and synthetic simulation results
show that Prepartition for offline scheduling performs 10%-20% better than well-known
load balancing algorithms with regard to average utilization, imbalance degree, makespan and
capacity_makespan. We also apply Prepartition to online load balancing (PrepartitionOn) and compare
it with existing online scheduling algorithms; PrepartitionOn improves performance by 8%-20%
with regard to average CPU utilization, imbalance degree, makespan and capacity_makespan. Both
theoretical and experimental results are provided.

Keywords: Cloud Computing, Physical Machines (PMs), Virtual Machines (VMs), Reservation Model, 
Load Balance Scheduling 


1. Introduction 

In traditional data centers, applications are tied to specific physical servers that are often
over-provisioned to deal with upper-bound workload. Such a configuration makes data centers expensive
to maintain, with wasted energy and floor space, low resource utilization and significant management
overhead. With virtualization technology, today’s Cloud data centers become more flexible and secure,
and provide better support for on-demand allocation. The definitions and model in this paper aim to be
general enough to be used by a variety of Cloud providers and focus on Infrastructure as a Service
(IaaS). A Cloud data center can be a distributed network in structure, containing many compute nodes
(such as servers), storage nodes, and network devices. Each node is formed by a series of resources
such as CPU, memory and network bandwidth, which are called multi-dimensional resources; each has its
corresponding properties. Under virtualization, Cloud data centers should have the ability to migrate
an application from one set of resources to another in a non-disruptive manner. Such ability is
essential in modern cloud computing infrastructure that aims to efficiently share and manage extremely
large data centers. Reactive migration of VMs is widely proposed for load balance and traffic
consolidation.

One key technology playing an important role in Cloud data centers is load balance scheduling. There
are quite many load balance scheduling algorithms. Most of them are designed for traditional web
servers and do not consider VM reservations with lifecycle characteristics. One of the challenging
scheduling problems in Cloud data centers is to consider the allocation and migration of
reconfigurable VMs and the integrated features of hosting PMs. The load balance problem for VM
reservations considering lifecycle is as follows: given a set of $m$ identical machines (PMs)
$PM_1, PM_2, \ldots, PM_m$ and a set of $n$ requests (VMs), where each request $[s_i, f_i, d_i]$ has a
start-time ($s_i$) and end-time ($f_i$) constraint and a capacity demand ($d_i$) from a PM, the
objective of load balance is to assign each request to one of the PMs so that the loads placed on all
machines are balanced or the maximum load is minimized. This problem is not yet well studied in the
open literature. The major contributions of this paper are:

• Providing a modeling approach to VM reservation scheduling with capacity sharing by modifying
the traditional interval scheduling problem and considering the life cycle characteristics of both
VMs and PMs.

• Designing and implementing load balancing scheduling algorithms, called Prepartition, for both
offline and online scheduling, which can prepare migration in advance and set a process-time bound
for each VM on a PM.

• Deriving computational complexity and quality analysis for both offline and online Prepartition.

• Providing performance evaluation of multiple metrics such as average utilization, imbalance
degree, makespan, time costs and capacity_makespan by simulating different algorithms using
trace-driven and synthetic data.

The remainder of this paper is organized as follows: Section 2 discusses related work on load
balance algorithms. Section 3 introduces the problem formulation. Section 4 presents the Prepartition
algorithm in detail, describing and comparing its offline and online versions. Performance
evaluations of different scheduling algorithms are shown in Section 5. Finally, a conclusion is given
in Section 6.


2. Related works 

A large amount of work has been devoted to scheduling algorithms, which can be mainly divided into
two types: online load balance algorithms and offline ones. The major difference is that online
schedulers know only the current request and the status of all PMs, whereas offline schedulers know
all the requests and the status of all PMs.

Andre et al. discussed the detailed design of a data center. Armbrust et al. [2] summarized the key
issues and solutions in Cloud computing. Foster et al. provided a detailed comparison between Cloud
computing and Grid computing. Buyya et al. [3] introduced a way to model and simulate Cloud computing
environments. Wickremasinghe et al. introduced three general scheduling algorithms for Cloud computing
and provided simulation results. Wood et al. [22] introduced techniques for virtual machine migration
with spots and proposed a few reactive migration algorithms. Zhang [23] compared major load balance
scheduling algorithms for traditional Web servers. Singh et al. proposed a novel load balance
algorithm called VectorDot, which deals with hierarchical and multi-dimensional resource constraints
by considering both servers and storage in a Cloud. Arzuaga et al. [3] proposed a quantifying measure
of load imbalance on virtualized enterprise servers considering reactive live VM migrations. Galloway
et al. [7] introduced an online greedy algorithm in which PMs can be dynamically turned on and off,
but the life cycle of a VM is not considered. Gulati et al. [9] presented challenging issues and
Distributed Resource Scheduling (DRS) as a load balance scheduling approach for Cloud-scale resource
management in VMware.

Tian et al. provided a comparative study of major existing scheduling strategies and algorithms for
Cloud data centers. Sun et al. [15] presented a novel heuristic algorithm to improve integrated
utilization considering multi-dimensional resources. Tian et al. designed a toolkit for modeling and
simulating VM allocation; [18] [19] introduced a dynamic load balance scheduling algorithm that
considers only the current allocation period and multi-dimensional resources, without considering the
life cycles of both VMs and PMs. Li et al. proposed a cloud task scheduling policy based on an ant
colony optimization algorithm to balance the entire system and minimize the makespan of a given task
set. Hu et al. presented an algorithm named Genetic, which uses historical data and current states to
choose an allocation.

Most existing research does not consider the fixed interval constraints of VM allocation. Knauth et
al. introduced energy-efficient scheduling algorithms applying timed instances that have an a priori
specified reservation time of fixed length; these assumptions are also adopted in this paper. Most
existing research considers reactive VM migration as a means of load balancing in data centers. To
the best of our knowledge, proactive VM migration by pre-partition has not yet been studied in the
open literature. It is one of the major objectives of this paper.

3. Problem Formulation 

3.1. Problem description and formulation 

In this paper we consider VM reservations and model VM allocation as a modified interval scheduling
problem (MISP) with fixed processing time. More explanation and analysis of traditional interval
scheduling problems with fixed processing time can be found in [12] and the references therein. We
present a general formulation of the modified interval scheduling problem and evaluate its results
against well-known existing algorithms. We make the following assumptions:

1) All data are deterministic and, unless otherwise specified, time is formatted in slotted
windows. We partition the total time period $[0, T]$ into slots of equal length ($s_0$); the total
number of slots is $k = T/s_0$. The start time $s_i$ and finish time $f_i$ are integer numbers of
slots. The interval of a request can then be represented in slot format as (start-time,
finish-time). For example, if $s_0 = 5$ minutes, an interval (3, 10) means that it has start time
and finish time at the 3rd slot and 10th slot respectively. The actual duration of this request is
$(10-3) \times 5 = 35$ minutes.

2) For all VM reservations, there are no precedence constraints other than those implied by the
start-time and finish-time.

3) The required capacity of each request is a positive real number in (0,1]. Notice that the
capacity of a single physical machine is normalized to be 1 and the required capacity of a VM can be
1/8, 1/4 or 1/2 or other portions of the total capacity of a PM. This is consistent with widely
adopted practice in Amazon EC2.

A few key definitions are explained as follows: 

Definition 1. Traditional interval scheduling problem (TISP) with fixed processing time: A set of
requests $\{1, 2, \ldots, n\}$ where the $i$-th request corresponds to an interval of time starting
at $s_i$ and finishing at $f_i$; each request needs a capacity of 1, i.e. it occupies the whole
capacity of a machine during its fixed processing time.

Definition 2. Interval scheduling with capacity 
sharing (ISWCS): The only difference from TISP 
is that a resource (to be concrete, a PM) can be 
shared by different requests if the total capacity 
of all requests allocated on the single resource at 
any time does not surpass the total capacity that 
the resource can provide. 

Definition 3. Sharing compatible intervals for ISWCS: A subset of intervals whose total required
capacity does not surpass the total capacity of a PM at any time, so that they can share the
capacity of a PM. In the literature, the makespan is used to measure the load balance, which is
simply the maximum total load (processing time) on any machine. Traditionally, the makespan is the
total length of the schedule.

In view of the problem in ISWCS for VM scheduling, we redefine the makespan as capacity_makespan.

Definition 4. Capacity_makespan of a PM $i$: In any allocation of VM requests to PMs, let $A(i)$
denote the set of VM requests allocated to machine $PM_i$. Under this allocation, machine $PM_i$
will have a total load equal to the sum of the products of each required capacity and its duration
(called capacity_makespan, abbreviated as CM in this paper), as follows:

$$CM_i = \sum_{j \in A(i)} d_j t_j \qquad (1)$$

where $d_j$ is the capacity request of $VM_j$ from a PM and $t_j$ is the span of request $j$ (i.e.,
the length of processing time of request $j$).
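
As a small illustration of Definition 4, the following Python sketch computes the capacity_makespan of one PM. The tuple layout `(start_slot, finish_slot, demand)` and the function name are assumptions made for this example, not notation from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: (start_slot, finish_slot, demand), demand in (0, 1].
Request = Tuple[int, int, float]


def capacity_makespan(requests: List[Request]) -> float:
    """Eq. (1): sum of demand * duration over the requests placed on one PM."""
    return sum(d * (f - s) for s, f, d in requests)


# A quarter-capacity VM over slots (3, 10) and a half-capacity VM over slots (0, 4).
pm_requests = [(3, 10, 0.25), (0, 4, 0.5)]
print(capacity_makespan(pm_requests))  # 0.25 * 7 + 0.5 * 4 = 3.75
```

Durations are counted in slots here; multiplying by the slot length (5 minutes in the example of assumption 1) would give the load in capacity-minutes.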

Therefore, the goal of load balancing is to minimize the maximum load (capacity_makespan) on any PM.
Some other related metrics such as average utilization and makespan are also considered and will be
explained in the following section. Assuming there are $m$ PMs in a data center, the problem of
ISWCS load balance can therefore be formulated as:

$$\min \; \max_{1 \le i \le m} CM_i \qquad (2)$$

subject to

1) $\forall$ slot $s$: $\sum_{VM_j \in PM_i} d_j \le 1$ (3)

2) $\forall j$: $s_j$ and $e_j$ are fixed by reservation. (4)

where $d_j$ is the capacity requirement of VM $j$ and the total capacity of a PM $i$ is normalized
to 1. Condition 1) expresses the sharing capacity constraint and condition 2) the interval
constraint of VM reservations.

Theorem 1: The offline scheduling problem of finding an allocation that minimizes the makespan in
the general case is NP-complete.

The proof can be found in [19] and is omitted here.
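
The sharing capacity constraint (3) can be checked slot by slot. The sketch below is a minimal illustration under two assumptions that are not stated in the paper: requests are `(start_slot, finish_slot, demand)` tuples, and a request occupies the slots from its start up to, but not including, its finish slot (which matches the duration $(10-3)$ used in the slot example).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Request = Tuple[int, int, float]  # hypothetical: (start_slot, finish_slot, demand)


def is_feasible(requests: List[Request], capacity: float = 1.0) -> bool:
    """Constraint (3): total demand placed on one PM never exceeds its capacity."""
    load: Dict[int, float] = defaultdict(float)
    for s, f, d in requests:
        for slot in range(s, f):  # the request occupies slots s, ..., f - 1
            load[slot] += d
    # Small tolerance guards against floating-point rounding of summed demands.
    return all(v <= capacity + 1e-9 for v in load.values())


print(is_feasible([(0, 4, 0.5), (2, 6, 0.5)]))   # overlapping halves fit together
print(is_feasible([(0, 4, 0.5), (2, 6, 0.75)]))  # 1.25 of capacity in slots 2 and 3
```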


3.2. Metrics for ISWCS load balancing algorithm 

In this section, a few metrics closely related to the ISWCS load balance problem are presented.
Some other metrics can be found in [19].

1) PM resource:

$PM_i(i, PCPU_i, PMem_i, PSto_i)$, where $i$ is the index number of the PM and $PCPU_i$, $PMem_i$,
$PSto_i$ are the CPU, memory and storage capacity that the PM can provide.

2) VM resource:

$VM_j(j, VCPU_j, VMem_j, VSto_j, T_j^{start}, T_j^{end})$, where $j$ is the VM type ID, $VCPU_j$,
$VMem_j$, $VSto_j$ are the CPU, memory and storage requirements of VM $j$, and $T_j^{start}$,
$T_j^{end}$ are the start time and end time, which are used to represent the life cycle of a VM.

3) Time slots: we consider a time span from 0 to $T$ divided into slots of the same length. The $n$
slots can be defined as $[(t_0, t_1), (t_1, t_2), \ldots, (t_{n-1}, t_n)]$, where each time slot
$T_k$ means the time span $(t_{k-1}, t_k)$.

4) Average CPU utilization of $PM_i$ between slot 0 and $T_n$ is defined as:

$$PCPU_i^U = \frac{\sum_{k=0}^{n} \left( PCPU_i^{T_k} \times T_k \right)}{\sum_{k=0}^{n} T_k} \qquad (5)$$

where $PCPU_i^{T_k}$ is the average CPU utilization during slot $T_k$. The average memory
utilization ($PMem_i^U$) and storage utilization ($PSto_i^U$) of PMs can be computed in the same
way. Similarly, the average CPU (memory and storage) utilization of a VM can be computed.

5) Makespan: the total length of a schedule for a set of VM reservations, i.e., the difference
between the start-time of the first job and the finishing time of the last job.

6) The capacity_makespan (CM) of all PMs can be formulated as:

$$CM = \max_{i} (CM_i) \qquad (6)$$
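
The metrics in items 4) to 6) can be computed directly from an allocation. The following sketch is illustrative only: the function names, the `(start_slot, finish_slot, demand)` tuples and the dictionary that maps a PM index to its requests are assumptions of this example.

```python
from typing import Dict, List, Tuple

Request = Tuple[int, int, float]  # hypothetical: (start_slot, finish_slot, demand)


def average_utilization(util_per_slot: List[float], slot_lengths: List[float]) -> float:
    """Eq. (5): per-slot utilization weighted by slot length."""
    weighted = sum(u * t for u, t in zip(util_per_slot, slot_lengths))
    return weighted / sum(slot_lengths)


def makespan(requests: List[Request]) -> int:
    """Finish time of the last job minus start time of the first job."""
    return max(f for _, f, _ in requests) - min(s for s, _, _ in requests)


def overall_cm(allocation: Dict[int, List[Request]]) -> float:
    """Eq. (6): the largest capacity_makespan over all PMs."""
    return max(sum(d * (f - s) for s, f, d in reqs) for reqs in allocation.values())


allocation = {0: [(0, 4, 0.5)], 1: [(3, 10, 0.25)]}
all_requests = [r for reqs in allocation.values() for r in reqs]
print(average_utilization([0.5, 1.0], [1, 3]), makespan(all_requests), overall_cm(allocation))
```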


From these equations, we notice that life cycle and capacity sharing are the two major differences
from traditional metrics such as makespan, which only considers process time (duration).
Traditionally, Longest Process Time first (LPT) [8] is widely used for load balance in offline
multi-processor scheduling. Reactive (post) migration of VMs is another popular way of
load-balancing. However, reactive migration has difficulty reaching predefined load balance
objectives, and may cause service interruption and instability as well as other associated costs.
By considering both the fixed process intervals and the capacity sharing properties of Cloud data
centers, we propose new offline and online algorithms as follows.


4. Prepartition Algorithm 

4.1. Offline Prepartition Algorithm 

For a given set of VM reservations, let us consider that there are $m$ PMs in a data center and
denote by OPT the optimal solution for a given set of $J$ VM reservations. First define

$$P_0 = \max\left\{ \max_{1 \le j \le J} CM_j, \; \frac{1}{m} \sum_{j=1}^{J} CM_j \right\} \le OPT \qquad (7)$$

$P_0$ is a lower bound on OPT. Algorithm 4.1 shows the pseudocode of the Prepartition algorithm.
The algorithm first computes the balance value by equation (7), defines the partition value ($k$)
and finds the length of each partition (i.e. $\lceil P_0/k \rceil$, which is the maximum time
length a VM can continuously run on a PM). For each request, Prepartition equally partitions it
into multiple $\lceil P_0/k \rceil$ subintervals if its CM is larger than $\lceil P_0/k \rceil$,
then finds a PM with the lowest average capacity_makespan and available capacity, and updates the
load on each PM. After all requests are allocated, the algorithm computes the capacity_makespan of
each PM and finds the total number of partitions (migrations). In practice, the scheduler has to
record all possible subintervals and their hosting PMs for each request so that migrations of VMs
can be conducted in advance to reduce overheads.

Theorem 2: The computational complexity of the Prepartition algorithm is $O(n \log m)$ using a
priority queue data structure, where $n$ is the number of VM requests after pre-partition and $m$
is the total number of PMs used.

Proof: The priority queue is designed such that each element (PM) has a priority value (average
capacity_makespan), and each time the algorithm needs to select an element from it, the algorithm
takes the one with the highest priority (the smaller the average capacity_makespan value, the
higher the priority). Building a priority queue of $n$ numbers takes $O(n)$ time, and a priority
queue performs insertion and extraction of the minimum in $O(\log n)$ steps (a detailed proof for
the priority queue can be found in the literature). Therefore, by using a priority queue or a
related data structure, the algorithm can find a PM with the lowest average capacity_makespan in
$O(\log m)$ time. Altogether, for $n$ requests, the Prepartition algorithm has time complexity
$O(n \log m)$.

Input: VM requests indicated by their (required VM type IDs, start times, ending times,
       requested capacity); $CM_i$ is the capacity_makespan of request $i$
Output: A PM ID assigned to every request and its partitions

Initialization: compute the bound value $P_0$ and set the partition value $k$;
for each request $i$ do
    if $CM_i > \lceil P_0/k \rceil$ then
        divide it equally into multiple $\lceil P_0/k \rceil$ subintervals and consider each
        subinterval as a new request;
    end
end
Sort all intervals in decreasing order of CM, breaking ties arbitrarily;
Let $I_1, I_2, \ldots, I_n$ denote the intervals in this order;
for all $j$ from 1 to $n$ do
    Pick up the VM with the earliest start time in the VM queue for execution;
    Allocate $j$ to the PM with the lowest load and available capacity;
    Update the load (CM) of the PM;
end
Compute the CM of each PM and the total number of partitions;

Algorithm 4.1: Pseudocode of the offline Prepartition algorithm
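
To make the structure of the offline procedure concrete, here is a simplified Python sketch of the partition-then-greedy-allocation idea. It is not the authors' implementation: the request tuples, the `split` helper, the rule used to cut a request into near-equal pieces, and the use of the number of cuts as the partition count are all assumptions of this example, and the per-slot capacity check ("available capacity") is deliberately omitted.

```python
import heapq
import math
from typing import Dict, List, Tuple

Request = Tuple[int, int, float]  # hypothetical: (start_slot, finish_slot, demand)


def cm(r: Request) -> float:
    """Capacity_makespan of one request: demand * duration."""
    s, f, d = r
    return d * (f - s)


def split(r: Request, bound: int) -> List[Request]:
    """Cut r into consecutive, near-equal pieces whose CM does not exceed bound."""
    s, f, d = r
    max_len = max(1, int(bound / d))          # longest run (in slots) on one PM
    pieces = math.ceil((f - s) / max_len)
    cuts = [s + (f - s) * i // pieces for i in range(pieces + 1)]
    return [(a, b, d) for a, b in zip(cuts, cuts[1:])]


def prepartition_offline(requests: List[Request], m: int, k: int):
    total = sum(cm(r) for r in requests)
    p0 = max(max(cm(r) for r in requests), total / m)     # Eq. (7)
    bound = math.ceil(p0 / k)
    parts: List[Request] = []
    for r in requests:
        parts.extend(split(r, bound) if cm(r) > bound else [r])
    parts.sort(key=cm, reverse=True)                      # decreasing CM
    heap = [(0.0, i) for i in range(m)]                   # (load, PM id) min-heap
    placement: Dict[int, List[Request]] = {i: [] for i in range(m)}
    for p in parts:
        load, i = heapq.heappop(heap)                     # least-loaded PM, O(log m)
        placement[i].append(p)
        heapq.heappush(heap, (load + cm(p), i))
    return placement, len(parts) - len(requests)          # allocation, number of cuts


reqs = [(0, 8, 1.0), (0, 4, 0.5), (2, 6, 0.5), (4, 8, 0.25)]
placement, cuts = prepartition_offline(reqs, m=2, k=4)
print(cuts, {i: sum(cm(p) for p in ps) for i, ps in placement.items()})
```

On this toy input the full-capacity request (CM of 8) is cut into four pieces, and the two PMs end up with loads 7 and 6, whereas placing the unsplit request on a single PM would already give that PM a load of 8.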

Theorem 3: The approximation ratio of the Prepartition algorithm is $(1 + \epsilon)$ regarding the
capacity_makespan, where $\epsilon = 1/k$ and $k$ is the partition value (a preset constant).

Proof: This is because each request has a bounded capacity_makespan after pre-partition based on
the ideal lower bound $P_0$. We sketch the proof as follows. Each job has start-time $s_i$,
end-time $f_i$ and process time $p_i = f_i - s_i$. Consider the last job to finish (after
scheduling all other jobs) and suppose this job starts at time $T_0$. All the machines must have
been fully loaded up to capacity_makespan $CM_0$, which gives $CM_0 \le OPT$. Since, for all jobs,
we have $CM_i \le \epsilon\, OPT$ (by the setting of the Prepartition algorithm in equation (7)),
this job finishes with load $CM_0 + \epsilon\, OPT$. Hence, the capacity_makespan of the schedule
can be no more than $CM_0 + \epsilon\, OPT \le (1+\epsilon)\, OPT$, which finishes the proof.
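
As a worked instance of this bound, take $\epsilon = 1/k$ and, purely for illustration, the partition value $k = 4$ that is also used in the experiments:

```latex
\[
CM \;\le\; CM_0 + \epsilon\, OPT
   \;\le\; OPT + \frac{1}{k}\, OPT
   \;=\; \Bigl(1 + \frac{1}{k}\Bigr) OPT ,
\qquad
k = 4 \;\Rightarrow\; CM \le 1.25\, OPT .
\]
```

Larger values of $k$ tighten the bound at the price of more partitions.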


4.2. Online Prepartition Algorithm 

For online VM allocation, scheduling decisions must be made without complete information about the
entire set of job instances because jobs arrive one by one. We extend the offline Prepartition
algorithm to the online scenario as PrepartitionOn.

Let us consider that there are $m$ PMs and $L$ VMs (including the one that has just arrived) in a
data center. First define

$$B_d = \min\left\{ \max_{1 \le j \le L} (CM_j)/2, \; \sum_{j=1}^{L} (CM_j)/m \right\} \qquad (8)$$

$B_d$ is called the dynamic balance value, which is one half of the max capacity_makespan of all
current PMs or the ideal load balance value of all current PMs in the system, whichever is
smaller, where $L$ is the number of VM requests that have already arrived. The reason for setting
$B_d$ to one half of the max capacity_makespan of all current PMs is to prevent large requests
from causing imbalance in some cases.

Algorithm 4.2 shows the pseudocode of the PrepartitionOn algorithm. Since requests come one by one
in the online setting, the system can only capture the information of requests that have already
arrived. When a new request comes into the system, the algorithm computes the dynamic balance
value by equation (8). Note that $L$ represents the number of requests that have already arrived
and $m$ represents the number of PMs in use. After the dynamic balance value ($B_d$) is computed,
the initial request is partitioned into several requests (segments) based on the partition value
$k$. Of these partitioned requests, the first one is executed instantly and allocated to the PM
with the lowest capacity_makespan, while the others are put back into the queue to wait for
execution. Then the algorithm picks up the next arrived request and follows the same partition and
allocation process. After all requests are allocated, the algorithm computes the capacity_makespan
of each PM and finds the total number of partitions for the $n$ requests. Since the number of
partitions and segments of each VM request is known at the moment of allocation, the system can
prepare VM migration in advance so that the process time and instability of migration can be
reduced.
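
The online procedure described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that arrivals are given as `(start_slot, finish_slot, demand)` tuples sorted by start slot, that waiting segments are allocated once their start slot is reached and are not split again, and it omits the per-slot capacity check. The helper names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

Request = Tuple[int, int, float]  # hypothetical: (start_slot, finish_slot, demand)


def cm(r: Request) -> float:
    s, f, d = r
    return d * (f - s)


def split(r: Request, bound: int) -> List[Request]:
    """Cut r into consecutive, near-equal pieces whose CM does not exceed bound."""
    s, f, d = r
    max_len = max(1, int(bound / d))
    pieces = math.ceil((f - s) / max_len)
    cuts = [s + (f - s) * i // pieces for i in range(pieces + 1)]
    return [(a, b, d) for a, b in zip(cuts, cuts[1:])]


def prepartition_online(arrivals: List[Request], m: int, k: int):
    loads = [(0.0, i) for i in range(m)]            # (load, PM id) min-heap
    placement: Dict[int, List[Request]] = {i: [] for i in range(m)}
    waiting: List[Request] = []                     # pending segments, ordered by start slot
    seen: List[float] = []                          # CM of every request arrived so far
    partitions = 0

    def allocate(p: Request) -> None:
        load, i = heapq.heappop(loads)              # PM with the lowest load
        placement[i].append(p)
        heapq.heappush(loads, (load + cm(p), i))

    for r in arrivals:                              # assumed sorted by start slot
        while waiting and waiting[0][0] <= r[0]:    # segments that are due first
            allocate(heapq.heappop(waiting))
        seen.append(cm(r))
        b_d = min(max(seen) / 2, sum(seen) / m)     # Eq. (8)
        bound = math.ceil(b_d / k)
        pieces = split(r, bound) if cm(r) > bound else [r]
        partitions += len(pieces) - 1
        allocate(pieces[0])                         # first segment runs at once
        for p in pieces[1:]:
            heapq.heappush(waiting, p)              # the rest wait in the queue
    while waiting:
        allocate(heapq.heappop(waiting))
    return placement, partitions


reqs = [(0, 8, 1.0), (0, 4, 0.5), (2, 6, 0.5), (4, 8, 0.25)]
placement, partitions = prepartition_online(reqs, m=2, k=2)
print(partitions, {i: sum(cm(p) for p in ps) for i, ps in placement.items()})
```

Because every segment and its hosting PM are fixed when the segment is created or dequeued, a scheduler built along these lines knows the migration points of a VM ahead of time, which is the property the paragraph above relies on.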

Theorem 4: The competitive ratio of PrepartitionOn is $(1 + \frac{1}{k} - \frac{1}{mk})$ regarding
the capacity_makespan.

Proof: Without loss of generality, we label PMs in order of non-decreasing final loads in
PrepartitionOn. Denote by $OPT(I)$ and $PrepartitionOn(I)$ respectively the optimal load balance
value of the corresponding offline scheduling and the load balance value of PrepartitionOn for a
given set of jobs $I$. Then the load of $PM_m$ defines the capacity_makespan. The first $(m-1)$
PMs each process a subset of the jobs and then experience a (possibly empty) idle period. All PMs
together finish a total capacity_makespan $\sum_{i=1}^{n} CM_i$ during their busy periods.
Consider the allocation of the last job $j$ to $PM_m$. By the scheduling rule of PrepartitionOn,
$PM_m$ had the lowest load at the time of allocation. Hence, any idle period on the first $(m-1)$
PMs cannot be bigger than the capacity_makespan of the last job allocated on $PM_m$ and hence
cannot exceed the maximum capacity_makespan divided by $k$ (the partition value), i.e.,
$\max_{1 \le i \le n} CM_i / k$. We have

$$m \times PrepartitionOn(I) \le \sum_{i=1}^{n} CM_i + (m-1) \frac{\max_{1 \le i \le n} CM_i}{k} \qquad (9)$$

which is equivalent to

$$PrepartitionOn(I) \le \frac{\sum_{i=1}^{n} CM_i}{m} + \frac{(m-1) \max_{1 \le i \le n} CM_i}{mk} \qquad (10)$$

which is

$$PrepartitionOn(I) \le OPT + \left( \frac{1}{k} - \frac{1}{mk} \right) OPT \qquad (11)$$

Note that $\sum_{i=1}^{n} CM_i / m$ is a lower bound on $OPT(I)$ because the optimum
capacity_makespan cannot be smaller than the average capacity_makespan on all PMs, and
$OPT(I) \ge \max_{1 \le i \le n} CM_i$ since the largest job must be processed on a PM. We
therefore have $PrepartitionOn(I) \le (1 + \frac{1}{k} - \frac{1}{mk})\, OPT$.

Input: VM requests come one by one, indicated by their information (required VM type IDs,
       start times, ending times, requested capacity); $CM_i$ is the capacity_makespan of request $i$
Output: A PM ID assigned to every request and its partitions

Initialization: set the partition value $k$ and the total partition number $P = 0$;
for each arrived job $j$ do
    Pick up the VM with the earliest start time in the VM queue to schedule;
    Compute $CM_j$ of VM $j$, and $B_d$ using Eq. (8);
    if $CM_j > \lceil B_d/k \rceil$ then
        partition $VM_j$ equally into multiple $\lceil B_d/k \rceil$ subintervals, consider each
        subinterval as a new request and add them into the VM queue;
        increase $P$ by the number of partitions created;
    else
        Allocate $j$ to the PM with the lowest load and available capacity;
        Update the load (CM) of the PM;
    end
end
Compute the CM of each PM and output the total number of partitions $P$;

Algorithm 4.2: PrepartitionOn Algorithm

Table 1: 8 types of virtual machines (VMs) in Amazon EC2

Compute Units    Memory     Storage     VM Type
1 unit           1.875GB    211.25GB    1-1(1)
4 units          7.5GB      845GB       1-2(2)
8 units          15GB       1690GB      1-3(3)
6.5 units        17.1GB     422.5GB     2-1(4)
13 units         34.2GB     845GB       2-2(5)
26 units         68.4GB     1690GB      2-3(6)
5 units          1.875GB    422.5GB     3-1(7)
20 units         7GB        1690GB      3-2(8)

Table 2: 3 types of physical machines (PMs) suggested

PM Pool Type    Compute Units    Memory     Storage
Type 1          16 units         30GB       3380GB
Type 2          52 units         136.8GB    3380GB
Type 3          40 units         14GB       3380GB

Theorem 5: The computational complexity of PrepartitionOn is $O(n \log m)$ using a priority queue
data structure, where $n$ is the number of VM requests after pre-partition and $m$ is the total
number of PMs used.

Proof: The proof is the same as that of Theorem 2 and is therefore omitted.


5. Performance Evaluation 

In this part, we present simulation results comparing the Prepartition algorithms with other
existing algorithms. To this end, we used a Java simulator, CloudSched (see Tian et al.). To be
realistic and reasonable, the simulations adopt data both from a Normal distribution and from the
Lawrence Livermore National Lab (LLNL) trace; see [26] for a detailed introduction to the trace.

All simulations are conducted on a computer configured with an Intel i5 processor at 2.5GHz and
4GB memory. All VM requests are generated following the Normal distribution. In the offline
algorithm comparisons, the Round-Robin (RR) algorithm, the Longest Process Time (LPT) algorithm
and the Post Migration Algorithm (PMG) are implemented.

5.1. Offline Algorithm Performance Evaluation 

1) Round-Robin Algorithm (R-R): a traditional load balancing scheduling algorithm that allocates
the VM requests in turn to each PM that can provide the required resource.

2) Longest Processing Time first (LPT): it first sorts the VM requests by processing time in
decreasing order, then allocates the requests in that order to the PM with the lowest load. In
this paper, the lowest load means the lowest capacity_makespan of all PMs.

3) Post Migration algorithm (PMG): First, it processes the requests in the same way as LPT does.
Then the average capacity_makespan of all jobs is calculated. The up-threshold and low-threshold
of the capacity_makespan for the post migration are calculated from the average
capacity_makespan and a factor (in this paper we set the factor to 0.1, so the up-threshold is
the average capacity_makespan multiplied by 1.1 and the low-threshold is the average multiplied
by 0.9). Of course the factor can be set dynamically to meet different requirements; however, the
larger the factor is, the higher the imbalance is. A migration list is formed by collecting the
VMs taken from PMs with capacity_makespan higher than the low-threshold. A VM is taken from a PM
only if the operation would not cause the capacity_makespan of the PM to fall below the
low-threshold. After that, the VMs in the migration list are re-allocated to a PM with
capacity_makespan less than the up-threshold. A VM is allocated to a new PM only if the operation
would not cause the capacity_makespan of the PM to exceed the up-threshold. There may still be
some VMs left in the list; finally the algorithm allocates the remaining VMs to the PMs with the
lowest capacity_makespan until the list is empty.
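
The PMG baseline described in item 3) can be sketched in Python as below. Several details are assumptions of this example rather than statements from the paper: requests are `(start_slot, finish_slot, demand)` tuples, the average capacity_makespan is taken per PM (total load divided by $m$), VMs are examined in the order in which they were placed, re-allocation prefers the least-loaded eligible PM, and the per-slot capacity check is omitted.

```python
from typing import List, Tuple

Request = Tuple[int, int, float]  # hypothetical: (start_slot, finish_slot, demand)


def cm(r: Request) -> float:
    s, f, d = r
    return d * (f - s)


def pmg(requests: List[Request], m: int, factor: float = 0.1):
    pms: List[List[Request]] = [[] for _ in range(m)]
    load = [0.0] * m

    def least_loaded(candidates) -> int:
        return min(candidates, key=lambda i: load[i])

    def place(r: Request, i: int) -> None:
        pms[i].append(r)
        load[i] += cm(r)

    # Step 1: LPT, i.e. longest processing time first onto the least-loaded PM.
    for r in sorted(requests, key=lambda r: r[1] - r[0], reverse=True):
        place(r, least_loaded(range(m)))

    avg = sum(load) / m                                  # assumed: average CM per PM
    up, low = avg * (1 + factor), avg * (1 - factor)

    # Step 2: build the migration list without pushing any PM below the low-threshold.
    moving: List[Request] = []
    for i in range(m):
        for r in list(pms[i]):
            if load[i] > low and load[i] - cm(r) >= low:
                pms[i].remove(r)
                load[i] -= cm(r)
                moving.append(r)

    # Step 3: re-allocate only where the up-threshold is not exceeded.
    left: List[Request] = []
    for r in moving:
        fits = [i for i in range(m) if load[i] + cm(r) <= up]
        if fits:
            place(r, least_loaded(fits))
        else:
            left.append(r)

    # Step 4: whatever remains goes to the PM with the lowest load.
    for r in left:
        place(r, least_loaded(range(m)))
    return pms, load


reqs = [(0, 10, 0.1), (0, 9, 0.1), (0, 8, 1.0), (0, 7, 0.1), (0, 6, 0.5)]
pms, load = pmg(reqs, m=2)
print(load)
```

In this toy run LPT alone leaves the two PMs with loads of about 4.7 and 8.9; the post-migration step moves one small VM and narrows the gap, but the heavy request stays where it is, which illustrates the best-effort nature of PMG.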

In this paper, we adopt the Amazon EC2 configuration of VMs and PMs as shown in Tables 1 and 2.
Note that one compute unit (CU) has the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or
2007 Xeon processor.

Observation 1. PMG is a best-effort trial heuristic for load balance. It does not guarantee a
bounded or predefined load balance objective. This is validated in the following performance
evaluation section.

5.1.1. Replay with LLNL Data Trace 

As realistic data, we adopt the log data from Lawrence Livermore National Lab (LLNL) [26]. The log
contains months of records collected by a large Linux cluster and has characteristics consistent
with our problem model. Each line of data in the log file includes 18 elements, of which we need
only the request-ID, start-time, duration and number of processors (capacity demands) in our
simulation. We convert the units from seconds in the LLNL log file into minutes, because we set 5
minutes as the time slot length, as mentioned in the previous section.

Fig. 1 and Fig. 2 show the average utilization, imbalance degree, makespan and capacity_makespan
comparison for different algorithms with the LLNL data trace. From these figures, we can see that
the Prepartition algorithm performs better than the other algorithms in average utilization,
imbalance degree, makespan and capacity_makespan. The Prepartition algorithm has 10%-20% higher
average utilization than PMG and LPT, and 40%-50% higher average utilization than Round-Robin
(RR). It has 10%-20% lower average makespan and capacity_makespan than PMG and LPT, 5% lower
imbalance degree than LPT, and 40%-50% lower average makespan and capacity_makespan than
Round-Robin (RR).

With the partition value k = 4, the PMG algorithm attains an imbalance degree quite similar to
that of Prepartition. Therefore, besides the above evaluations, we also vary the partition number
k over 4, 8 and 10 to compare the effect on the imbalance degree. In Figure 3, we can see that a
larger k value induces a lower imbalance degree. Similarly, with a larger k value, larger average
utilization and lower makespan and capacity_makespan can be obtained.

However, increasing the k value brings side effects. The dominant one is running time. In Fig. 4,
we compare the time costs under different partition values k: the Prepartition algorithm with
k = 8 costs about 10% more running time, and with k = 10 about 15% more running time, than the
Prepartition algorithm with k = 4 on average. This is easy to understand: a larger k value
produces a better load balance but leads to more partitions, and more partitions need more time
to process.

Observation 2. Whatever number of migrations is taken, the post migration algorithm (PMG) cannot
achieve the same level of average utilization, makespan and capacity_makespan as Prepartition
does.

This is because Prepartition works at a much more refined and desired scale by prepartitioning
based on reservation data, while PMG is just a best-effort trial by migration.

Figure 1: The offline algorithm comparison of average utilization (a) and imbalance degree (b) with LLNL trace

Figure 2: The offline algorithm comparison of makespan (c) and capacity_makespan (d) with LLNL trace

Figure 3: The comparison of imbalance value by varying k values (plot title: Imbalance Degree Comparison; series: Prepartition with k=4, k=8 and k=10; x-axis: 100, 200 and 400 VMs)

Figure 4: The comparison of time costs by varying k values


5.1.2. Results Comparison by Synthetic Data 

We set 5 minutes as a slot, so there are 12 slots in an hour and 288
slots in a day. All requests follow a Normal distribution with mean
μ = 864 slots (three days) and standard deviation δ = 288 slots (one
day). After the requests are generated in this way, we start the
simulator to simulate the scheduling effects of the different
algorithms and collect the comparison results. To collect data, we
first fix the k value of the Prepartition algorithm at 4, and the
different types of VMs are generated with equal probabilities. We then
vary the number of VMs over 100, 200, 400 and 1600 to trace the
tendency. Each data point is the average of 10 runs.
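The workload setup above can be sketched as follows. This is a minimal illustration rather than the simulator used in the paper. The text does not say which request attribute follows the Normal distribution, so the sketch assumes it is the request duration in slots, clipped to at least one slot. The VM type names and the function `generate_requests` are hypothetical.

```python
import random

SLOT_MINUTES = 5
SLOTS_PER_HOUR = 60 // SLOT_MINUTES      # 12 slots per hour
SLOTS_PER_DAY = 24 * SLOTS_PER_HOUR      # 288 slots per day

MEAN_SLOTS = 3 * SLOTS_PER_DAY           # 864 slots (three days)
STD_SLOTS = 1 * SLOTS_PER_DAY            # 288 slots (one day)

VM_TYPES = ["small", "medium", "large"]  # hypothetical type names

def generate_requests(n, seed=0):
    """Generate n synthetic VM requests as (vm_type, duration_in_slots)."""
    rng = random.Random(seed)
    requests = []
    for _ in range(n):
        # Assumption: the Normal draw is the duration, clipped to >= 1 slot.
        duration = max(1, round(rng.gauss(MEAN_SLOTS, STD_SLOTS)))
        vm_type = rng.choice(VM_TYPES)   # each type is equally likely
        requests.append((vm_type, duration))
    return requests

for n in (100, 200, 400, 1600):
    batch = generate_requests(n)
    mean_duration = sum(d for _, d in batch) / n
    print(n, round(mean_duration))
```

Repeating each configuration with 10 different seeds and averaging the resulting metrics would mirror the 10-run averaging described above.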

Fig. 5 and Fig. 6 show the average utilization, makespan and
capacity_makespan comparison of the different algorithms. From these
figures, we can see that the Prepartition algorithm has 10%-20% higher
average utilization than PMG and LPT, and 40%-50% higher average
utilization than Round-Robin (RR). The Prepartition algorithm also has
8%-13% lower average makespan and capacity_makespan than PMG and LPT,
and 40%-50% lower average makespan and capacity_makespan than
Round-Robin (RR). We can also see that the PMG algorithm improves on
the performance of the LPT algorithm. The LPT
Figure 5: The offline algorithm comparison of average utilization (a) and imbalance degree (b) with Normal distribution 

Figure 6: The offline algorithm comparison of makespan (c) and capacity_makespan (d) with Normal distribution 


algorithm is better than the R-R algorithm. Similar results are
observed for the comparison of makespan. The performance improvement of
the PMG algorithm is obtained from the extra migration operations: VM
migration enables a better load balance.


5.2. Online Prepartition Algorithm 

In this part, we present simulation results comparing the
PrepartitionOn algorithm with three existing algorithms. Random,
Round-Robin, Online Resource Scheduling Algorithm (OLRSA) [24] and
PrepartitionOn are implemented for comparison:

1) Random Algorithm: a scheduling algorithm that randomly allocates
each request to a PM that can provide the required resources.

2) Round-Robin Algorithm (R-R): a traditional load balancing scheduling
algorithm that allocates the VM requests in turn to each PM that can
provide the required resources.

3) OLRSA Algorithm: an online scheduling algorithm that computes the
capacity_makespan of each PM and sorts the PMs by capacity_makespan in
descending order. It always allocates the request to the PM that has
the lowest capacity_makespan and the required resources.
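The three baseline policies can be sketched as PM selection rules. This is a simplified illustration, not the implementation evaluated in the paper. It assumes that a PM's capacity_makespan is the sum of capacity times duration over its allocated requests. It also assumes that a PM "can provide the required resources" whenever the requested capacity does not exceed the PM's total capacity, which ignores overlap in time. The `PM` class and the function names are hypothetical, and the OLRSA rule is written as a direct minimum instead of an explicit sort.

```python
import random

class PM:
    """A physical machine tracking its accumulated capacity_makespan."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.capacity_makespan = 0.0  # assumed: sum of capacity * duration

    def can_host(self, req_capacity):
        # Simplified feasibility check; time overlap is ignored here.
        return req_capacity <= self.capacity

    def allocate(self, req_capacity, duration):
        self.capacity_makespan += req_capacity * duration

def pick_random(pms, req_capacity, rng):
    """Random: choose uniformly among the PMs that can host the request."""
    candidates = [i for i, pm in enumerate(pms) if pm.can_host(req_capacity)]
    return rng.choice(candidates)

def make_round_robin():
    """Round-Robin: visit PMs in turn, skipping those that cannot host."""
    state = {"next": 0}
    def pick(pms, req_capacity):
        n = len(pms)
        for offset in range(n):
            i = (state["next"] + offset) % n
            if pms[i].can_host(req_capacity):
                state["next"] = (i + 1) % n
                return i
        raise ValueError("no PM can host the request")
    return pick

def pick_olrsa(pms, req_capacity):
    """OLRSA-style rule: feasible PM with the lowest capacity_makespan."""
    candidates = [i for i, pm in enumerate(pms) if pm.can_host(req_capacity)]
    return min(candidates, key=lambda i: pms[i].capacity_makespan)

# Hypothetical requests given as (capacity, duration in slots).
pms = [PM(capacity=8) for _ in range(3)]
for cap, dur in [(4, 10), (2, 30), (8, 5), (1, 12)]:
    chosen = pick_olrsa(pms, cap)
    pms[chosen].allocate(cap, dur)
print([pm.capacity_makespan for pm in pms])
```

All three rules share the same feasibility filter, so they differ only in how they choose among the feasible PMs.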


5.2.1. Replay with LLNL Data Trace 

For realistic data, we use the log data from Lawrence Livermore
National Lab (LLNL) [26] because the data is suitable for our research
problem. Fig. 7 and Fig. 8 illustrate the comparisons of average
utilization, imbalance degree, makespan and capacity_makespan. From
these figures, we can see that PrepartitionOn shows the highest average
utilization, the lowest imbalance degree and the lowest makespan. As
for capacity_makespan, OLRSA has been shown to perform much better than
the Random and Round-Robin algorithms, and PrepartitionOn still
improves on OLRSA by 10%-15% in average utilization, 20%-30% in
imbalance degree, and 5%-20% in makespan.

5.2.2. Results Comparison by Synthetic Data 

We set 5 minutes as a slot, so there are 12 slots in an hour and 288
slots in a day. All requests follow a Normal distribution with mean
μ = 864 slots (3 days) and standard deviation δ = 288 slots (1 day).
Different types of VMs are generated with equal probabilities, and we
then vary the request generation to produce request sets of different
sizes to trace the tendency. From Fig. 9 and Fig. 10, we can see that
PrepartitionOn has better performance in average utilization, imbalance
degree, makespan and capacity_makespan. Compared with OLRSA,
PrepartitionOn still improves
Figure 7: The online algorithm comparison of average utilization (a) and imbalance degree (b) with LLNL trace 

Figure 8: The online algorithm comparison of makespan (c) and capacity_makespan (d) with LLNL trace 

Figure 9: The online algorithm comparison of average utilization (a) and imbalance degree (b) with Normal distribution 

Figure 10: The online algorithm comparison of makespan (c) and capacity_makespan (d) with Normal distribution 


about 10% in average utilization, 30%-40% in imbalance degree, 10%-20%
in makespan, and 10%-20% in capacity_makespan.

It is apparent that large k values may bring side effects, since more
partitions are needed. In Fig. 11, we compare the time costs (simulated
with LLNL data; the time unit is milliseconds) under different
partition values k. The PrepartitionOn algorithm with k = 3 takes about
10% less running time than with k = 4, and with k = 2 it takes 15% less
running time than with k = 4. A larger k value thus produces a better
load balance at the cost of longer processing time. We also observe
that a larger k value induces a lower capacity_makespan value.
Similarly, with a larger k value, higher average utilization, lower
imbalance degree and lower makespan are obtained.
Figure 11: The comparison of time costs for PrepartitionOn by varying k values


6. Conclusion

In this paper, to reflect the capacity sharing and fixed-interval
constraints of VM scheduling in Cloud data centers, we propose new
offline and online load balancing algorithms. Theoretically, we prove
that offline Prepartition is a (1+ε)-approximation, where ε = 1/k and k
is a positive integer. By increasing k it is possible to get very close
to the optimal solution; that is, by setting the k value it is possible
to achieve a predefined load balance goal as desired, because offline
Prepartition is a (1+ε)-approximation and online Prepartition
(PrepartitionOn) has competitive ratio (1 + …). Both synthetic and
trace-driven simulations have validated the theoretical observations
and shown that the Prepartition algorithm performs better than a few
existing algorithms in average utilization, imbalance degree, makespan
and capacity_makespan, for both offline and online settings. A few
research issues remain to be considered:

• Making a suitable choice between the total number of partitions and
the load balance objective. The Prepartition algorithm can achieve a
desired load balance objective by setting a suitable k value. It may
need a large number of partitions, so the number of migrations can be
large depending on the characteristics of the VM requests. For example,
in EC2 the duration of VM reservations varies from a few hours to a few
months. We can first classify VMs into different types based on their
durations (capacity_makespans); applying Prepartition to each type then
does not require a large number of partitions. In practice, we need to
analyze traffic patterns to keep the number of partitions
(premigrations) reasonable, so that the total costs, including running
time and migrations, are not very high.

• Considering heterogeneous configurations of PMs and VMs. We mainly
consider the case in which a VM requires a portion of the total
capacity of a PM. This is also the model applied in EC2 and by Knauth
et al. When this does not hold, multi-dimensional resources such as
CPU, memory and bandwidth have to be considered together or separately
in the load balance; see the cited works for a detailed discussion of
multi-dimensional resources.

• Considering precedence constraints among different VM requests. In
reality, some VM reservations may be more important than others, and
the current algorithm should be extended to handle this case.
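The first issue above suggests grouping reservations by duration before partitioning. A minimal sketch of that grouping step follows. The class boundaries (one day and thirty days, in hours) and the names `classify` and `group_by_duration` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical class boundaries in hours; chosen only for illustration.
DURATION_CLASSES = [
    ("short", 24),            # up to one day
    ("medium", 24 * 30),      # up to thirty days
    ("long", float("inf")),   # anything longer
]

def classify(duration_hours):
    """Return the name of the first class whose upper bound fits."""
    for name, upper in DURATION_CLASSES:
        if duration_hours <= upper:
            return name

def group_by_duration(requests):
    """Group (request_id, duration_hours) pairs by duration class."""
    groups = defaultdict(list)
    for req_id, duration_hours in requests:
        groups[classify(duration_hours)].append(req_id)
    return dict(groups)

reservations = [("a", 3), ("b", 2000), ("c", 5), ("d", 100)]
print(group_by_duration(reservations))
```

Each group could then be partitioned separately, so a few very long reservations do not force a fine partition on the short ones.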